{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e6685771",
   "metadata": {},
   "source": [
    "# Lesson 3: Translation and Summarization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c7109da",
   "metadata": {},
   "source": [
    "- In the classroom, the libraries are already installed for you.\n",
    "- If you would like to run this code on your own machine, you can install the following:\n",
    "\n",
    "```\n",
    "    !pip install transformers \n",
    "    !pip install torch\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97e3e7c9-1437-4784-8a21-cd200bc609a5",
   "metadata": {},
   "source": [
    "- Here is some code that suppresses warning messages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "782af222-1bea-449a-8dd4-655ad7a7b8ea",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from transformers.utils import logging\n",
    "logging.set_verbosity_error()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bea43ec1",
   "metadata": {},
   "source": [
    "### Build the `translation` pipeline using 🤗 Transformers Library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1d46ac9-d665-4690-99a4-43b625e02114",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from transformers import pipeline \n",
    "import torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "014e1c26-df35-406c-8ac2-9789b011c86b",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "translator = pipeline(task=\"translation\",\n",
    "                      model=\"./models/facebook/nllb-200-distilled-600M\",\n",
    "                      torch_dtype=torch.bfloat16) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69d8f7a5",
   "metadata": {},
   "source": [
    "NLLB: No Language Left Behind: ['nllb-200-distilled-600M'](https://huggingface.co/facebook/nllb-200-distilled-600M).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "095bd1c5-a96f-4b20-8e9c-601b0b158fd8",
   "metadata": {
    "height": 115
   },
   "outputs": [],
   "source": [
    "text = \"\"\"\\\n",
    "My puppy is adorable, \\\n",
    "Your kitten is cute.\n",
    "Her panda is friendly.\n",
    "His llama is thoughtful. \\\n",
    "We all have nice pets!\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03d9ebdf-86d8-493b-8757-74b3d1010442",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "text_translated = translator(text,\n",
    "                             src_lang=\"eng_Latn\",\n",
    "                             tgt_lang=\"fra_Latn\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "711052f5",
   "metadata": {},
   "source": [
    "To choose other languages, you can find the other language codes on the page: [Languages in FLORES-200](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)\n",
    "\n",
    "For example:\n",
    "- Afrikaans: afr_Latn\n",
    "- Chinese: zho_Hans\n",
    "- Egyptian Arabic: arz_Arab\n",
    "- French: fra_Latn\n",
    "- German: deu_Latn\n",
    "- Greek: ell_Grek\n",
    "- Hindi: hin_Deva\n",
    "- Indonesian: ind_Latn\n",
    "- Italian: ita_Latn\n",
    "- Japanese: jpn_Jpan\n",
    "- Korean: kor_Hang\n",
    "- Persian: pes_Arab\n",
    "- Portuguese: por_Latn\n",
    "- Russian: rus_Cyrl\n",
    "- Spanish: spa_Latn\n",
    "- Swahili: swh_Latn\n",
    "- Thai: tha_Thai\n",
    "- Turkish: tur_Latn\n",
    "- Vietnamese: vie_Latn\n",
    "- Zulu: zul_Latn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4ba07e3-4a5e-4bf2-86a9-498781828eca",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "text_translated"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7517649",
   "metadata": {},
   "source": [
    "## Free up some memory before continuing\n",
    "- In order to have enough free memory to run the rest of the code, please run the following to free up memory on the machine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c16e5dad-dac0-42e4-9a87-8128a1d49b44",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "import gc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43cafb3a-51b8-4aae-929c-31d524dec530",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "del translator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61d698a7-8ae2-475e-ac46-d768c282b17c",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "gc.collect()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2fac55f",
   "metadata": {},
   "source": [
    "### Build the `summarization` pipeline using 🤗 Transformers Library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b132c646-0c6a-4c57-939a-b3015ea4b76f",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "summarizer = pipeline(task=\"summarization\",\n",
    "                      model=\"./models/facebook/bart-large-cnn\",\n",
    "                      torch_dtype=torch.bfloat16)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ca6f847",
   "metadata": {},
   "source": [
    "Model info: ['bart-large-cnn'](https://huggingface.co/facebook/bart-large-cnn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98276d66-4274-4a2f-b6a7-b4fb839b94f7",
   "metadata": {
    "height": 149
   },
   "outputs": [],
   "source": [
    "text = \"\"\"Paris is the capital and most populous city of France, with\n",
    "          an estimated population of 2,175,601 residents as of 2018,\n",
    "          in an area of more than 105 square kilometres (41 square\n",
    "          miles). The City of Paris is the centre and seat of\n",
    "          government of the region and province of Île-de-France, or\n",
    "          Paris Region, which has an estimated population of\n",
    "          12,174,880, or about 18 percent of the population of France\n",
    "          as of 2017.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d856f193-cbf7-450b-8ae3-42287096e56f",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "summary = summarizer(text,\n",
    "                     min_length=10,\n",
    "                     max_length=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2c79f81-6baf-4f6b-95ee-b1a2072ec073",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "summary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca56abc0",
   "metadata": {},
   "source": [
    "### Try it yourself! \n",
    "- Try this model with your own texts!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "002c0464-7d37-4b50-b3fe-4099101e2c8b",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
